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We describe a simple Importance Sampling strategy for Monte Carlo simulations based on a least 
squares optimization procedure. With several numerical examples, we show that such Least Squares 
Importance Sampling (LSIS) provides efficiency gains comparable to the state of the art techniques, 
when the latter are known to perform well. However, in contrast to traditional approaches, LSIS 
is not limited to the determination of the optimal mean of a Gaussian sampling distribution. As 
a result, it outperforms other methods when the ability to adjust higher moments of the sampling 
distribution, or to deal with non-Gaussian or multi-modal densities, is critical to achieve variance 
reductions. 
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I. INTRODUCTION 

The impressive development of the securities markets 
has generated in the last few years a steady demand for 
more and more structured financial products. At the 
same time, the level of sophistication and complexity of 
the pricing models employed by investment firms has dra- 
matically increased, in a continuous search for a possible 
edge against competitors. As a result, an increasing frac- 
tion of the models employed in practice is too complex to 
be treated by analytic or deterministic numerical meth- 
ods (trees or partial differential equations), and Monte 
Carlo simulation becomes more often than ever the only 
computationally feasible means of pricing and hedging. 

Although generally easy to implement, Monte Carlo 
simulations are infamous for being slow. In fact, being 
stochastic in nature, their outcome is always affected by 
a statistical error, that can be generally reduced to the 
desired level of accuracy by iterating the calculation for 
long enough time. This comes with a high computa- 
tional cost as such statistical uncertainties, all things be- 
ing equal, are inversely proportional to the square root of 
the number of statistically independent samples. Hence, 
in order to reduce the error by a factor of 10 one has to 
spend 100 times as much computer time. For this reason, 
to be used on a trading floor, Monte Carlo simulations 
often require to be run on large parallel computers with 
a high financial cost in terms of hardware, infrastructure, 
and software development. 

Motivated by this very practical necessity, several ap- 
proaches to speed up Monte Carlo calculations, such as 
Antithetic Variables, Control Variates, and Importance 
Sampling, have been proposed over the last few years [l[. 
These techniques aim to reduce the variance per Monte 
Carlo observation so that a given level of accuracy can be 
obtained with a smaller number of iterations. In general, 
this can be done by exploiting some information known 
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a priori on the structure of the problem at hand, like 
a symmetry property of the Brownian paths (Antithetic 
Variables), the value of a closely related security (Con- 
trol Variates) , or the form of the statistical distribution of 
the random samples (Importance Sampling). Antithetic 
Variables and Control Variates are the most commonly 
used variance reduction techniques, mainly because of 
the simplicity of their implementation, and the fact that 
they can be accommodated in an existing Monte Carlo 
calculator with a small effort. However, their effective- 
ness varies largely across applications, and is sometimes 
rather limited [1|. 

On the other hand. Importance Sampling techniques, 
although potentially more powerful, have not been em- 
ployed much in professional contexts until recently. This 
is mainly because such techniques generally involve a big- 
ger implementation effort, and they are also less straight- 
forward to include in a general Monte Carlo framework. 
Moreover, when used improperly. Importance Sampling 
can increase the variance of the Monte Carlo estimators, 
thus making its integration in an automated environment 
more problematic. Nonetheless, the potential efficiency 
gains at stake are so large that the interest in finding 
efficient Importance Sampling schemes is still very high. 

The idea behind Importance Sampling is to reduce the 
statistical uncertainty of a Monte Carlo calculation by 
focusing on the most important sectors of the space from 
which the random samples are drawn. Such regions criti- 
cally depend on both the random process simulated, and 
the structure of the security priced. For instance, for a 
deep out-of-the money Call option 0] , the payoff sampled 
is zero for most of the iterations of a Monte Carlo sim- 
ulation. Hence, simulating more samples with positive 
payoff reduces the variance. This can be done by chang- 
ing the probability distribution from which the samples 
are drawn, and reweighing the payout function by the 
appropriate likelihood-ratio (Radon-Nikodyn derivative) 
in order to produce an unbiased result of the original 
problem [1]. 

Most of the work in Importance Sampling methods 
for security pricing has been done in a Gaussian setting 
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d, H, H, 0, H, 0, such the one arising from the sim- 
ulation of a diffusion process. In this framework, Impor- 
tance Sampling is achieved by modifying the drift term 
of the simulated process in order to drive the Brownian 
paths towards the regions that are the most important 
for the evaluation of the security. For instance, for the 
Call option above, this can be obtained by increasing the 
drift term up to a certain optimal level |3i, [4] . The dif- 
ferent approaches proposed in the literature, essentially 
differ in the way in which such change of drift is found, 
and can be roughly divided into two families depending 
on the strategy adopted. The first strategy, common to 
the so-called adaptive Monte Carlo methods 0, H, H, [13] , 
aims to determine the optimal drift through stochastic 
optimization techniques that typically involve an itera- 
tive algorithm. On the other hand, the second strategy, 
proposed in a remarkable paper by Glasserman, Heidel- 
berger, and Shahabuddin (GHS) Q, relies on a deter- 
ministic optimization procedure that can be applied for 
a specific class of payouts. This approach, although ap- 
proximate, turns out to be very effective for several pric- 
ing problems, including the simulation of a single factor 
Heath- Jarrow-Morton model ?| , and portfolio credit sce- 
narios (Tl| . 




FIG. 1: Sampling probability density functions for a Euro- 
pean Call option with T = 1, r = 0.05, a = 0.3, Xo = if = 50 
as obtained with LSIS [optimizing just the drift, LSIS(/i), and 
both the drift and the volatility, LSIS(/i, o")], and the sad- 
dle point approximation (GHS). On this scale the results for 
LSIS(/i) and GHS are indistinguishable. The original (I13p and 
the optimal ((9} sampling densities are also shown for compar- 
ison. 



II. IMPORTANCE SAMPLING 



In this paper, we propose the Least Squares Impor- 
tance Sampling (LSIS), as an effective and flexible vari- 
ance reduction method for Monte Carlo security pric- 
ing. This approach, originally proposed in Physics for 
the optimization of quantum mechanical wave functions 
of correlated electrons , is applied here in a financial 
setting. In LSIS the determination of the optimal drift - 
or more in general of the most important regions of the 
sample space - is formulated in terms of a least squares 
minimization. This technique can be easily implemented 
and included in an existing Monte Carlo code, and simply 
relies on a standard least square algorithm for which sev- 
eral optimized libraries are available. We will show that 
LSIS provides efficiency gains analogous to the ones of 
previously proposed methods when the latter are known 
to perform well. However, LSIS does not share some of 
their limitations, and its range of applications includes 
multi-modal problems and non-Gaussian sampling. 



In the following Section, we begin by reviewing the 
main ideas behind Importance Sampling in a generic 
Monte Carlo framework. Then in Section III we special- 
ize the discussion to the Gaussian setting, and we briefly 
review the recent adaptive strategies, and the GHS ap- 
proach of Ref.[^. The rationale of LSIS is introduced 
in Section IV together with the essential implementation 
details, and in Section V we present the results of our 
numerical experiments. Here we perform a systematic 
study of the variance reductions obtained by means of 
LSIS for a variety of test cases, including a comparison 
with recent Importance Sampling techniques. Finally, we 
draw our conclusions in Section VI. 



Let us consider the general problem of estimating the 
expectation value of a scalar function, G{Z), depending 
on a d-dimensional real random vector Z = (Zi, . . . , Zd) 
with joint probability distribution P{Z), 



V = Ep [G[Z)] = [ dZ G[Z) P{Z) 
Jd 



(1) 



where D is the domain of possible values of the state 
variables Z [Toj . In a financial derivatives context, 
G{Z) would typically represent the discounted pay- 
out of a certain security, and P{Z) would be the risk 
neutral-probability measure of an arbitrage free mar- 
ket (isl. Il4l. ITsj . For instance, for the^familiar Call op- 
tion in the Black-Scholes framework 2] one has d = 1, 
P{Z) = (27r)-i/2 exp {-Z'^/2) and 



G{Z) 



-,-rT 



Xq exp 



.--IT- 



aZ 



K 



.(2) 

where r is the risk-free interest rate, a is the volatility, 
Xq and K are respectively the spot and strike price, and 
T the maturity of the option. 

Whenever the dimension d of the state variable Z is 
large (say > 4) standard numerical quadrature ap- 
proaches become highly inefficient, and Monte Carlo 
methods are the only feasible route for estimating ex- 
pectation values of the form ([1]). To do so, one interprets 
Eq. (IT|) as a weighted average of the payout function G{Z) 
over the possible configurations Z with weights given 
by the probability distribution P{Z). This immediately 
leads to the simplest (and crudest) Monte Carlo estima- 
tor which is obtained by averaging the payout function 
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FIG. 2: Optimal drift vector as obtained with the LSIS and 
the GHS procedures for an arithmetic Asian CaU option (|15|l. 

= {X - K+), on a lognormal asset (O for M = 16 
a = 0.3, Xo ^ K ^ 50, r ^ 0.05, and T = 1.0. 



over a sample of A'p independent values of the random 
variable Z generated according to the probability distri- 
bution P{Z), 



V 



1 



P{Z) 



(3) 



In particular, the central limit theorem [16j ensures that, 
for big enough samples, the values of the estimator V are 
normally distributed around the true value, and converge 
for Np — > oo towards V namely 



V 



^ z— 1 



± 



(4) 



where = Ep [G{xf] - Ep [G{x)f is the variance of 
the estimator and can be similarly approximated by 



1 

^'^^i:^G{z,)-vf 



(5) 



Although Eq. ([4]) ensures the convergence of the Monte 
Carlo estimator to the expectation value ([T]) , its practical 
utility depends on the magnitude of the variance, . In- 
deed, the square root convergence in (|4]) , implies that the 
number of replications Np that are (asymptotically) nec- 
essary to achieve a given level of accuracy is proportional 
to the variance of the estimator [2^. Roughly speaking, 
such quantity is relatively small whenever the function 
G{Z) is approximately constant over the region of val- 
ues of Z that is represented the most among the random 
samples, i.e., the region that contains most of the prob- 
ability mass of P{Z). This is generally not the case for 
most of the pricing problems encountered in practice, and 
the calculation of accurate estimates of the expectation 
value (dl) may require large sample sizes Np, thus becom- 
ing computationally demanding. 

However, the choice of extracting the random vari- 
able Z according to the probability distribution P{Z), 
although natural, is by no means the only possible one. 



Indeed, the Monte Carlo integration can be performed 
by sampling an arbitrary probability distribution P{Z) 
provided that the integral is suitably reweighed. In fact, 
using the identity 



/ dZ G{Z)P{Z) ^ [ d 

J D J D 



G(Z)P(Z) f,, 



an alternative estimator of the expectation value ([T]) is 
readily found as 



with the weight function given by W{Z) = P{Z)/P{Z). 
The variance of the new Monte Carlo estimator reads 



dZ {W{Z) G{Z) - Vf P{Z) 



(8) 



and critically depends on the choice of the sampling prob- 
ability distribution P(Z). For non- negative functions 
G{Z), the optimal choice of P{Z) is the one for which 
E vanishes, namely: 



PoAZ) = ^G{Z)P{Z) 



(9) 



In fact, the Monte Carlo estimator corresponding to such 
optimal sampling distribution reads 

^^-]v-E^(^OG(Z,0^]^^,., (10) 



leading to a constant value V on each Monte Carlo repli- 
cation, and resulting therefore in zero variance [2l| . Un- 
fortunately, such a choice is not really viable as the nor- 
malization constant, V, is the expectation value ([1]) we 
want to calculate in the first place. 



III. GAUSSIAN FRAMEWORK 

The Importance Sampling approaches proposed in 
the literature usually apply or are generally formulated 
within a Gaussian setting, i.e., in a context where the 
distribution P{Z) defining the expectation value ((1]) is a 
d-dimensional standard normal distribution. For exam- 
ple, this is the case for a Monte Carlo simulation of a 
vector diffusive process of the form [l[ 



dX{t) = n{X{t),t)dt + a{X{t),t)dWt 



(11) 



e.g., for the calculation of the price of an option de- 
pending on the path followed by X{t) within a certain 
time interval [0,T]. Here the precess X{t) and the drift 
li{X,t) are both L-dimensional real vectors, Wt is a A^- 
dimensional standard Brownian motion, and the volatil- 
ity, a{X,t), is a. L X N real matrix. 
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TABLE I: Comparison between LSIS, the adaptive Robbins-Monro (RM) algorithm (as quoted in Ref. [lo[), and the saddle 
point approach of Ref. @] (GSH): price of a European Call option on a lognormal asset (|14p for different values of the volatility 
a, and of the strike price K. The variance reduction (VR) is defined in Eq. (|26|) . The parameters used are r = 0.05, Xq = 50, 
T = 1.0, and the number of simulated paths is 1,000,000 for Crude MC, LSIS and CHS, 50,000 for RM. Results for LSIS obtained 
by optimizing the drift only [LSIS(/i)], and both the drift and the volatility [LSIS(/i, ct)] are reported. The uncertainties on 
the least significative digits of the option prices, and variance reductions are reported in parentheses. Note that in Ref. no 
error estimate was quoted for the RM results. However, from the ratio of the simulated paths, it is sensible to estimate the 
errors on the RM variance reductions as about \/20 ~ 4 times larger than those quoted for LSIS(/i). 







Crude MC 


LSIS(/i) 




LSIS(/i, 






RM 


GHS 




a 


K 


Price 


Price 


VR 


Price 


VR 


Price 


VR 


Price 


VR 


0.1 


30 


21.4633(50) 


21.46294(49) 


104(1) 


21.46296(12) 


1700(100) 


21.47 


112(4) 


21.46294(50) 


100(1) 




50 


3.4032(39) 


3.4019(14) 


7.8(1) 


3.4046(10) 


15(1) 


3.41 


7.8(4) 


3.4018(14) 


7.8(1) 




60 


0.2315(11) 


0.23112(19) 


33.5(5) 


0.23132(12) 


84(5) 


0.23 


31(2) 


0.23126(19) 


33.5(5) 


0.3 


30 


21.598(15) 


21.5912(37) 


16.4(1) 


21.5984(21) 


51(1) 


21.63 


16.8(4) 


21.5973(39) 


14.8(2) 




50 


7.114(11) 


7.1169(35) 


9.9(5) 


7.1159(21) 


27(1) 


7.12 


11(2) 


7.1146(35) 


9.9(1) 




60 


3.4954(83) 


3.4514(21) 


15.6(1) 


3.4523(14) 


35(1) 


3.45 


15.2(4) 


3.4508(22) 


14.2(1) 



TABLE II: Same as Table |T] for a European Put option. 







Crude MC 


LSIS (A) 




LSIS(/i, (T 


) 




RM 


GHS 




a 


K 


Price 


Price 


VR 


Price 


VR 


Price 


VR 


Price 


VR 


0.1 


40 


0.004235(98) 


0.0041650(47) 


435(6) 


0.0041616(41) 


571(9) 


0.0042 


350(24) 


0.0041650(47) 


435(6) 




50 


0.9636(19) 


0.96419(64) 


8.8(1) 


0.96436(38) 


25(2) 


0.97 


9.6(4) 


0.96385(63) 


9.1(1) 




60 


7.3052(46) 


7.3059(19) 


5.9(1) 


7.3056(11) 


17(1) 


7.31 


6.3(4) 


7.3047(19) 


5.9(1) 


0.3 


30 


0.13397(83) 


0.13445(13) 


41(1) 


0.13448(10) 


69(2) 


0.13 


38(4) 


0.013440(13) 


40.8(5) 




50 


4.6794(65) 


4.6761(27) 


5.8(1) 


4.6767(16) 


16.5(5) 


4.68 


6.2(4) 


4.6761(27) 


5.8(1) 




60 


10.5203(97) 


10.5236(44) 


4.9(1) 


10.5266(26) 


13.9(2) 


10.54 


4.8(4) 


10.5223(46) 


4.4(1) 



Continuous time processes of the form (jlip are typi- 
cally simulated by sampling X{t) on a discrete grid of 
points, = to < ti < . . . < tM = T, by means, for 
instance, of a Euler scheme [22] 



(12) 



where Xi — X{ti), AU — ti+i — U, and Zi+i is a iV- 
dimensional vector of independent standard normal vari- 
ates. In this representation, each discretized path for 
the vector process X{t) can be put into a one to one 
correspondence with a set of d = iV x M independent 
standard normal variables Z. Hence, the original prob- 
lem of evaluating the expectation value of a functional of 
the realized path of the process X{t) can be formulated 
as in ([T|). More precisely, G{Z) is given by the discretized 
payout functional, and the probability density is given by 
a d-dimensional standard normal distribution 



P{Z) = {2tt 



(13) 



where Z'^ = Z -Z. 

As a prototypical example of exotic option pricing 
problem, treated as a test case for the most recently pro- 
posed Importance Sampling strategies [1, H, , in the 
following we will consider different arithmetic Asian style 
options under a Black-Scholes log-normal model [15]. In 
this case, the underlying asset can be simulated on the 
time grid relevant for the payout function by means of 



the exact recursion 



Xi+i = Xi exp 



{r-(7y2)AU + ay/AUZ,+i , (14) 



where r is the risk free interest rate, a is the constant 
N X L real volatility matrix, and is a L-dimensional 
vector of components cr^ — '^kj- For an arithmetic 

Asian style claim, the discounted payout function is of 
the form 



GiZ) 



(15) 



where X = {^/M)J2iLiXi: and ^{X) is some function 
of L variables. Clearly, European style options are recov- 
ered in the special case AI — 1. 



A. Gaussian Trial Distributions 

The simplest possible strategy for Importance Sam- 
pling in a Gaussian framework is to try to guide the 
sampled paths towards the most important regions of 
the configuration space (i.e., where the contribution of 
the integrand is the largest), by means of a change of the 
drift terms of the process (fTT|) or . The corresponding 
trial probability density reads 



-d/2 



(16) 
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TABLE III; European Butterfly spread option (|28|l on a lognormal asset (|14|l for different values of the spot price: comparison 
between the saddle point approach of Ref. [gj (GHS), and LSIS with the optimization of both drift and volatility [LSIS(/i, cr)] 
The parameters used are r = 0.1, Ki = 45, K2 ~ 50, K3 = 55, T = 1.0, a = 0.3, and the number of simulated paths is 
1,000,000. 





Crude MC 


GHS 




LSIS(/i, a) 






Price 


Price 


VR 


Price 


VR 


30 


0.15818(69) 


0.15732(33) 


4.4(8) 


0.157659(40) 


298(5) 


40 


0.4871(11) 


0.48666(98) 


1.26(1) 


0.48703(11) 


100(5) 


50 


0.6274(13) 


0.6274(13) 


1.00(1) 


0.62750(11) 


140(1) 


60 


0.5156(12) 


0.5157(10) 


1.44(1) 


0.515636(93) 


166(2) 


70 


0.32961(97) 


0.32875(67) 


2.10(1) 


0.329281(73) 


177(2) 



TABLE IV: Comparison between LSIS and the saddle point approach of Ref. 01 (GSH): price of Asian options (I15|l on a 
lognormal asset (|14[) for different values of the volatility cr, the strike K, and of the number of observation dates M. The 
parameters used are r = 0.05, Xq = 50, T = 1.0, and the number of simulated paths is 1,000,000. 









Crude MC 


LSIS 




GHS 




M 


(7 


K 


Price 


Price 


VR 


Price 


VR 


16 


0.1 


45 


6.0565(29) 


6.05522(88) 


10.86(5) 


6.05537(89) 


10.62(5) 






50 


1.9198(22) 


1.91994(80) 


7.56(5) 


1.91914(83) 


7.03(5) 






55 


0.20272(74) 


0.20235(16) 


21.4(2) 


0.20237(16) 


21.4(2) 


16 


0.3 


45 


7.1545(77) 


7.1531(26) 


8.8(1) 


7.1529(27) 


8.13(1) 






50 


4.1730(63) 


4.1714(20) 


9.9(1) 


4.1712(21) 


9.0(1) 






55 


2.2135(48) 


2.2115(13) 


13.6(7) 


2.2116(14) 


11.8(7) 


64 


0.1 


45 


5.9967(28) 


5.99510(87) 


10.4(5) 


5.99500(85) 


10.9(5) 






50 


1.8467(21) 


1.84522(78) 


7.2(3) 


1.84525(81) 


6.7(3) 






55 


0.17519(67) 


0.17448(14) 


23(2) 


0.17443(14) 


23(1) 


64 


0.3 


45 


7.0257(75) 


7.0204(25) 


9.0(2) 


7.0204(26) 


8.3(2) 






50 


4.0271(61) 


4.0222(19) 


10.3(5) 


4.0220(20) 


9.3(5) 






55 


2.0849(46) 


2.0794(13) 


12.5(5) 


2.0794(13) 


12.5(5) 



where /i is a d-dimensional vector, and the weight func- 
tion, as also expected from the Girsamov theorem [3], 
is 

1¥^(Z) =exp[-/i-Z + MV2] . (17) 

A variety of approaches for the determination of the 
drift vector /i minimizing the variance of the estima- 
tor ([51 has been recently proposed in the literature 
d, S H ■ These can be roughly classified into 
two families depending on the strategy adopted. The 
first strategy, common to the so-called adaptive Monte 
Carlo methods, aims to determine the optimal drift vec- 
tor though a stochastic minimization of the variance. 
Such minimization differs in details in the various meth- 
ods but always involves an iterative procedure, to be per- 
formed in a preliminary Monte Carlo simulation. 

In particular, Su and Fu [1, [§], building upon pre- 
vious work by Vazquez- Abad and Dufresne [5|, used 
a gradient-based stochastic approximation, dubbed in- 
finitesimal perturbation analysis, in order to estimate the 
optimal uniform shift of the drift for the diffusion (|12p . 
minimizing the variance of the estimator ([5]) . In the no- 
tation of this Section, this translates in working with a 
trial density of the form (|16p where the drift vector fl has 
components all equal to a single optimization parameter. 
The improvement of this method with respect to the one 



of Ref. is that the minimization is carried out un- 
der the original probability measure (as we also do for 
LSIS), while in the latter the minimization was formu- 
lated under the trial probability measure. As a result, 
the stochastic minimization applies also for non differen- 
tiable payout, thus making the approach more general. 
The application of this technique to partial average Asian 
options in a Black-Scholes market, and to caplets under 
the Cox-IngersoU-Ross model provides significative vari- 
ance reductions [1, [§] . 

Along similar ideas, Arouna To^ has recently proposed 
a different stochastic optimization method for the deter- 
mination of the optimal sampling density (|16p . Here, in 
contrast to the previous approach, all the components 
of the drift vector are independently optimized. The 
method relies on a truncated version of the Robbins- 
Monro algorithm that is shown to converge asymptot- 
ically to the optimal drift, and to provide an effective 
variance reduction in a variety of cases. However, as re- 
marked by the same author, a critical aspect of the prac- 
tical implementation of the Robbins-Monro algorithm is 
that it depends on the size of the iterative step. Hence, 
a particular care needs to be taken in order for the algo- 
rithm to be efficient [1^. 

On the other hand, the second strategy, proposed by 
Glasserman, Heidelberger, and Shahabuddin (GHS) [a]. 
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TABLE V: Importance Sampling plus Stratification. Comparison between LSIS, and the GHS approach of Ref. for the 
Asian Option of Table |lVl 



M 


a 


K 


T.STS+ SS 

Price 


VR 


OHS + SS 

Price 


VR 


16 


0.3 


45 


7.15284(25) 


950(20) 


7.15266(24) 


1030(10) 






50 


4.17122(18) 


1225(15) 


4.17118(18) 


1225(30) 






55 


2.21183(11) 


1900(100) 


2.21183(11) 


1900(50) 


64 


0.3 


45 


7.02075(23) 


1060(30) 


7.02076(23) 


1060(30) 






50 


4.02251(17) 


1290(30) 


4.02250(17) 


1290(30) 






55 


2.07967(13) 


1320(100) 


2.07965(12) 


1470(100) 



relies on a saddle point approximation to minimize the 
variance of the estimator ([8]) , or equivalently of its second 
moment (in the original measure) 

m2{p)= [ dZW~^{Z)G{Zf P{Z) . (18) 

JD 

In fact, if the payout function G{Z) is positive definite, by 
defining F{Z) = \ogG{Z) one can approximate Eq. (fTS|) 
with the zero-order saddle point expansion 

(27r)-''/2 f exp [2F{Z) - fl ■ Z + - Z^/2] 
Jd 

~ C exp max (2F(Z) 

where C is a constant. As a result, within this approxi- 
mation, the problem of determining the optimal change 
of drift boils down to finding the vector fi such that 

m^x{2F{Z) - fl- Z + Z^/2) (19) 

is minimum. It is easy to show that this is obtained by 
choosing jl* = Z* where Z* is the point that solves the 
optimization problem 



max {F{Z) - Z^ /2) 



(20) 



or equivalently, for which the payout times the origi- 
nal distribution, G{Z)P{Z), is maximum, i.e., Z* corre- 
sponds to the maximum of the optimal sampling density, 
Eq. ([9]). The simplest interpretation of the saddle point 
approach is therefore that it approximates the zero vari- 
ance distribution by means of a normal density with the 
same mode and variance. 

The saddle point approach can be expected to be par- 
ticularly effective in reducing the variance of the Monte 
Carlo estimator whenever the log payout function F{Z) 
is close to be linear in the portion of the configuration 
space where most of the probability mass of P{Z) lays. 
Moreover, GHS have also shown that stratifying 1] the 
d-dimensional random vector Z along the optimal ^, pro- 
vides a spectacular variance reduction in certain cases 
0, 0]- However, whenever the optimal sampling prob- 
ability ([9]) cannot be accurately represented by a single 
Gaussian with the same mode and variance, the saddle 
point approximation is less beneficial. In particular, as it 



will be also illustrated in Sec.|Vl this approach turns out 
to be less effective whenever the structure of the payout 
function G{Z) is such that the optimal sampling distri- 
bution ([9|) has a width which is very different from the 
one of the original distribution, or is multi-modal. 

In the following Section we describe an alternative least 
squares strategy that is straightforward to implement and 
fiexible enough to be applied in a generic Monte Carlo 
setting. Indeed, the Least Squares Importance Sampling 
(LSIS) is not limited to the determination of the opti- 
mal change of drift in a Gaussian model. Instead, it 
can be applied to any Monte Carlo simulation provided 
that a reasonable guess of the optimal sampling density 
is available. For this reason, in the next Section we will 
momentarily leave the Gaussian framework, and we will 
describe the rationale of LSIS in a more general setting. 



IV. LEAST SQUARES IMPORTANCE 
SAMPLING 



A practical approach to the search of an effective Im- 
portance Sampling distribution can be formulated in 
terms of a non-linear optimization problem. To this pur- 
pose, let us consider a family of trial probability densi- 
ties, Pe{Z), depending on a set of Ng real parameters 
9 — {9i,92, ■ ■ ■ ,9Ng) ■ The variance of the estimator cor- 
responding to Pe{Z), Eq. can be written in terms of 
the original probability distribution P{Z) as 



= Ep [We{Z)G''{Z)\ - Ep [G[Z)y 



(21) 



with We{Z) = P{Z)/Pg{Z). Hence, the optimal Impor- 
tance Sampling distribution within the family Po{Z) is 
the one for which the latter quantity, or equivalently the 
second moment (HI 



(22) 



Ep[We{Z)G\Z)\ , 



is minimum. The crucial observation is that the Monte 
Carlo estimator of this quantity, 

2 

^jjrH iwg{z,Y/^G{Zi)) ~ p{z) , 

p i=i 

(23) 
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can be interpreted as a non-linear least squares fit of a 
set of Np data points {xi,yi) with a function y = fg{x) 
parameterized by 6*, with the correspondence yi 0, 
Xi ^ Z^, and fg{x) We{Zy^^G{Z). The latter is a 
standard problem of statistical analysis that can be tack- 
led with a variety of robust and easily accessible numer- 
ical algorithms, as the so-called Levenberg-Marquardt 
method ^l^- 

Alternatively, to improve the numerical stability of the 
least-squares procedure, it is convenient in some situa- 
tions to minimize, instead of (|22p . the pseudo- variance 



S2i0) 



We{zy/^G{Z) ~Vt 



1 / 

— J2[We{Z,)'^'G{Z,)~VT 



(24) 



where the constant Vr is a guess of the option value. 
Indeed, the minimization of (|24p is equivalent to the one 
of the real variance of the estimator (PT|) as 



S2i9) = ±l + iEp[GiZ)]-VTf 



(25) 



The algorithm for the determination of the optimal 
sampling distribution within a certain trial family can 
be therefore summarized as it follows: 

• Generate a suitable number iV^ of replications of 
the state variables Z according to the original prob- 
ability distribution P{Z); 

• Choose a trial probability distribution Pg{Z), and 
an initial value of the vector of parameters 9; 

• Set x^^ Z,, fg{x) ^ Wg{ZY/^G{Z) and ^ 
(resp. yi Vt) and call a least squares fitter, say 
LSQ [x, y, fe{X), 9] , providing the optimal 9 = 9* 
by minimizing the second moment of the estimator 
m2(6'), Eq. ^ [resp. 5*2 (6*), Eq. JH]. 

Once the optimal parameters 9* have been determined 
through the least squares algorithm, one can perform an 
ordinary Monte Carlo simulation by sampling the prob- 
ability distribution P0i,{Z), and calculating expectation 
values according to Eq. 

As it will be illustrated with the numerical examples of 
Sec.|Vl this procedure does not add a significant overhead 
to the simulation, because just a relatively small number 
of replications Np <^ Np is usually required to determine 

the optimal parameter 9* . This is due to the fact that the 
configurations over which the optimization is performed 
are fixed. As a result of this form of correlated sampling 
[l^ , the difference in the m2 (9) 's for two sets of values of 
the parameters being optimized is much more accurately 
determined than the values of the 7712(0) 's themselves. 
This rather surprising feature is rooted in the fact that 
the minimization of Eq. ([23| as a means to optimize the 
trail density, Pg{Z), can be justified in terms of a gen- 
uine maximum likelihood criteria , and it is therefore 
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FIG. 3; Sampling probability density functions for a straddle 
option (O with r = 0.05, Xo ^ K ^ 50 , a = 0.3, T = 1, 
as obtained with LSIS [optimizing just the drift, LSIS(/i), 
both the drift and the volatility , LSIS(/i, cr), and using the 
bi-modal trial density (|31|l . LSISbm], and the saddle point 
approximation (GHS). The original (|13|l and the optimal (|9]) 
sampling densities also shown for comparison. 



independent on how accurately m2{9) approximates the 
quantity (j^ . 

Clearly, the fact that only a limited number of Monte 
Carlo samples is required for the optimization of the trial 
density limits the overhead introduced by the LSIS algo- 
rithm. This makes LSIS a practical strategy for variance 
reduction. 

In the following we will demonstrate the effectiveness 
of this simple strategy by applying it to a variety of test 
cases, and we will compare the present approach with the 
ones most recently proposed in the literature [1, H, |^ ■ 



V. NUMERICAL EXAMPLES 
A. European Options 

We will start by considering European style options. 
These are the simplest possible examples of financial rel- 
evance that allow one to illustrate the LSIS strategy. In 
particular, we will consider first standard Call and Put 
options, written on a single underlying asset, X{t), fol- 
lowing the log normal process ([H]). Here N = M = 
L = I, and Ato = T. The payout function reads as in 
Eq. 115]) with X Xi= X{T), ^{X) = {X - K)+ and 
$(X) = {K-X)+ for the Call and the Put, respectively, 
and K is the strike price. 

In these cases, the probability distribution P{Z) is a 
univariate Gaussian distribution, and the option value 
([T|) involves the integration of a function of a single vari- 
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able. As discussed in Sec. [TTl Importance Sampling tech- 
niques seek a sampling probability density Pe{Z) as close 
as possible to the optimal sampling distribution, Eq. ([9|) 
(see Figure [T]). The simplest choice for Pe{Z), in this 
setting, is a Gaussian distribution of the form (|16p (with 
d — 1), so that the only parameter 9 to optimize is the 
drift fl. 

The LSIS algorithm described in Sec. |TV] is particu- 
larly straightforward to implement, and involves a pre- 
simulation stage in which a set of standard normal 
random numbers are generated and stored, and a least 
squares optimization routine is invoked. As anticipated, 
we found that the least squares fitter was able to deter- 
mine successfully the optimal p, with as little as ~ 50 
Monte Carlo replications, thus making the approach use- 
ful in practice. 

In Tables |T] and |TT] we compare the results obtained 
with LSIS with the ones obtained by means of the Rob- 
bins Monro (RM) adaptive Monte Carlo (as quoted in 
Ref. [lOj), and the saddle point approach of CHS @. 
Here, as an indicator of the efficiency gains introduced 
by the different strategies of Importance Sampling, we 
have defined the variance ratio as 



^^^^.(Cr^KicMOy 
(t(IS) J 



(26) 



where the numerator and denominator are the statistical 
errors (for the same number of Monte Carlo paths) of the 
Crude and the Importance Sampling estimators, respec- 
tively. In particular, for this and the following examples, 
we have estimated the uncertainties on the statistical er- 
rors, and on the VR's by performing a standard error 
analysis of the outcome of several Monte Carlo simula- 
tions for different random number seeds. 

We found that the different methods produce a signi- 
ficative and comparable variance reduction. In particu- 
lar, as it is intuitive, the change of drift is more effective 
for low volatility, and deep in and out of the money op- 
tions. In this case, LSIS and CHS optimized trial dis- 
tributions Pp (Z) are very similar as shown Fig. [1] This 
could be expected as, in this case, the optimal Impor- 
tance Sampling distribution ^ can be effectively approx- 
imated by a Gaussian with the same mode and variance, 
so that the CHS approach produces accurate results. 

However, the LSIS method is not limited to Impor- 
tance Sampling strategies based on a pure change of drift, 
and one can easily introduce additional optimization pa- 
rameters in the trial density. For instance, in this exam- 
ple it makes sense to introduce the sampling volatility, 



(27) 



As illustrated in Fig. [U by adjusting both p, and a, one 
obtains a trial density closer to the optimal one. This 
corresponds to an additional variance reduction up to 
over one order of magnitude, as shown in Tables [J and 

im 



Another simple example in which the ability to adjust 
the width of the sampling distribution proves effective is 
the Butterfly spread option, 



$(X) = iX- Ki)+ - 2{X - K2)- + {X 



(28) 



with Ki < K2 < Ks- In this case, a pure change of 
drift can only force the expected end point of the sam- 
pled paths X{T) to fall in the money, Ki < X{T) < K3. 
However, for values of the volatility or the maturity large 
enough, a significative fraction of the sampled paths will 
still results in a zero payout. Hence, a pure change of drift 
is not very effective in reducing the variance. This is il- 
lustrated in Table [TTTl Instead, by quenching the sampled 
volatility one can increase the number of paths falling in 
the money thus reducing the variance. For instance, in 
the example of Table IIIIl for Sq = 50 the saddle point 
change of drift is very small, and does not alter the vari- 
ance. Instead, by allowing the sampling volatility to vary, 
we get fl* ~ —0.02 and a* ~ 0.14, reducing the variance 
by more than two orders of magnitude. 




FIG. 4: Optimal sampling distribution, Eq. (|9]), for the Euro- 
pean basket Call option ^ for r = 0.05, K = 100, = 100, 
Xl = 105, ai = (72 = 0.3, T = 1. 



B. Asian Call option 

As a second example we consider here the single un- 
derlying (i = 1) arithmetic Asian option Eq. (|15p . with a 
Call payout, $(Z) = e"''^ {X ~ K)+, and M observation 
dates. 

In this case, the calculation of the expectation value 
(fT|) involves a multi-dimensional integral over the prob- 
ability distribution (jl3|) . However, the implementation 
of LSIS is not more difficult than for the European pay- 
out. Indeed, the fact that the sample points Zi are in 
the present case M-dimensional vectors is practically ir- 
relevant for the LSIS procedure. All the least square fit- 
ter needs is a method that, given a configuration Zi and 
the vector 6*, provides the value of the target function 
je{Z,) = WeiZ^y^^GiZ,). 
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TABLE VI: Comparison between LSIS, the adaptive Robbins-Monro (RM) method (as quoted in Ref. [T3|)i for the price of 
an Asian Call option (|15p on a lognormal asset (|14p for different values of the volatility ct, and of the strike price K. The 
parameters used are r — 0.1, Xq — 50, T — 0.5, and the number of simulated paths is 1,000,000 for Crude MC and LSIS, 
800,000 for RM. The uncertainties on the RM variance reductions are rough estimates based on the number of simulated paths. 







Crude MC 


LSIS 




RM 


M 


a 


Price 


Price 


VR 


Price 


VR 


20 


0.1 


40 10.7820(21) 


10.78243(22) 


91(2) 


10.73 


58(2) 






50 1.6045(16) 


1.60398(61) 


6.9(1) 


1.53 


7.0(1) 






55 0.04919(29) 


0.049204(54) 


28.8(5) 


0.037 


6.0(5) 




0.3 


40 10.8284(62) 


10.8277(17) 


13(1) 


10.77 


14(1) 






50 3.1373(44) 


3.1352(15) 


8.6(1) 


2.99 


9.0(1) 






60 0.3790(16) 


0.37861(35) 


20.9(5) 


0.320 


28.0(5) 


TABLE VIL Comparison 


I between LSIS, the adaptive 


approach of Su and Fu (SF) (as quoted 


in Ref. [8]), 


for the price of a 


partial average Asian Call option (|15|l and (|29|l on a 


lognormal asset (|14|l for different values of the volatility a, and of the 


strike 


price K. The parameters used are r = 0.05, Xo = 


= 100, T = 1.0, M = 


365, Mo — 305, and the number of simulated paths 


is 1,000,000 for Crude MC, and LSIS, and 50,000 for SF. The uncertainties 


on the SF variance 


reductions are rough estimates 


based 


on the number of simulated paths. 














Crude MC 


LSIS 






SF 


a 


K 


Price 


Price 


VR 


Price 


VR 


0.2 


100 


9.792(14) 


9.7816(46) 


9.3(1) 


9.747 


6.7(3) 




110 


5.413(11) 


5.4256(33) 


11.1(1) 


5.397 


8.2(3) 




120 


2.7758(77) 


2.7669(20) 


14.8(2) 


2.730 


11.0(6) 




130 


1.3038(52) 


1.3096(12) 


18.8(1) 


1.284 


17.0(4) 




140 


0.5861(35) 


0.58298(61) 


33(1) 


0.575 


25(3) 




150 


0.2421(22) 


0.24592(30) 


54(4) 


0.241 


44(12) 




160 


0.1003(14) 


0.09973(14) 


100(8) 


0.0980 


85(20) 




170 


0.03761(84) 


0.039000(64) 


172(20) 


0.0380 


173(80) 


0.3 


100 


13.323(21) 


13.3527(65) 


10.4(5) 


13.295 


7(1) 




110 


9.182(18) 


9.1564(54) 


11.1(5) 


9.103 


8(1) 




120 


6.1041(15) 


6.1258(39) 


14.8(5) 


6.059 


10(1) 




130 


4.020(12) 


4.0034(31) 


15.0(5) 


3.985 


12(3) 




140 


2.5624(98) 


2.5769(23) 


18.1(7) 


2.556 


15(2) 




150 


1.6410(80) 


1.6325(13) 


38(1) 


1.603 


22(2) 




160 


1.0143(62) 


1.02355(90) 


47(2) 


1.006 


30(5) 




170 


0.6408(50) 


0.63681(61) 


67(3) 


0.623 


36(6) 



sity Popt{Z) oc G{Z)P{Z), with covariance close to one. 
Hence, as discussed in the previous Section, the opti- 
mal sampling density can be accurately represented by 
a Gaussian distribution, and the saddle point method 
provides an accurate approximation of the optimal drift 
vector. Indeed, as shown Fig. [21 the optimal drift vec- 
tors obtained with the LSIS and the GHS approaches 
are quite similar, with an overlap of ~ 0.99 when both 
normalized to one. 

As anticipated, we found that a few hundreds of Monte 
Carlo configurations and 10-20 iterations of the least 
squares fitter, were typically enough to determine the 
optimal drift vector in (flB)) for AI ~ 50, thus making 
the overhead of the pre-simulation stage rather limited. 
However, as the number of time steps - or more in gen- 
eral the number of components of the drift vector - in- 
creases, the complexity of the optimization problem in- 
creases as well. Nevertheless, as suggested in Ref. Q, 
one can significantly reduce the computation time asso- 
ciated with the optimization stage by approximating the 



In order to compare LSIS with the other approaches 
proposed in the literature we have considered a Gaussian 
trial density of the form Eq. Here such density 

depends on as many parameters, /2 = {jiQ, . . . , jiM-i), 
as are the sampling dates in the Euler discretization of 
the process ([T^. A typical outcome of the optimized 
drift vector produced by the LSIS procedure is shown in 
Fig.H 



1. Comparison with the Saddle Point Approximation 

As shown in Tabic IIVI LSIS provides a very effective 
variance reduction for the Asian Call options considered, 
similar to the one obtained with the approach of GHS. 
This could be expected on general grounds as in the 
case of the European Call and Put options. Indeed, it 
is not difficult to verify that the single asset Asian Call 
option generates a single-mode optimal sampling den- 



10 



TABLE VIII: Importance Sampling results for the European style straddle (I30|l obtained by means of the saddle point ap- 
proximation (GHS) and LSIS, with the optimization of the drift [LSIS(/i)], and the drift and the volatility [LSIS(/i, a)]. The 
parameters used are r = 0.05, Xq = K — 50, a = 0.3, T = 1, and correspond to the probability distributions sketched in Fig.|3] 
The number of simulated paths is 1,000,000. 



Crude MC 


GHS 




LSIS (A) 




LSIS(/i, a) 




Price 


Price 


VR 


Price 


VR 


Price 


VR 



11.803(10) 11.800(33) 0.10(1) 11.8038(88) 1.30(1) 11.8047(55) 3.00(2) 



drift vector with a continuous function parameterized by 
a small number of parameters. These are in turn tuned 
by the least square algorithm in order to determine an 
approximate optimal drift vector. We have found that a 
particularly effective realization of this approach is to ap- 
proximate the drift vector by a piecewise linear function, 
parameterized by its values where it changes slope (the 
so-called knot points). For instance, in the present exam- 
ples, optimizing over 4 to 8 equally spaced knot points 
provides variance reductions practically indistinguishable 
from the ones obtained by a full optimization. In addi- 
tion, the drift vector generally changes continuously with 
the market parameters. As a result, an additional com- 
putational saving can be obtained by starting the pre- 
simulation from a drift vector optimized for a case with 
a similar set of parameters. 

As anticipated in Sec. IIII A| Importance Sampling can 
be also naturally combined with Stratified Sampling by 
choosing the optimal drift vector as the stratification di- 
rection (for further details see Ref.P]). This, as shown in 
Table fVl gives rise to a further reduction of the variance, 
which can be of several order of magnitudes, depending 
on the option parameters, thus resulting in quite remark- 
able savings in computational time. 



2. Comparison with Adaptive Monte Carlo approaches 

It is also interesting to compare the LSIS results with 
those obtained by means of the recently proposed adap- 
tive Monte Carlo methods briefly discussed in Sec. IIII Al 
As illustrated in Table IVIl we found that our approach 
achieves variance reductions similar to the ones obtained 
with the Robbins-Monro algorithm of Ref . . However, 
for the example considered, LSIS appears to perform bet- 
ter for small values of the volatility and deep in and out of 
the money options. This is particularly remarkable when 
considering that the pre-simulation stage required by the 
Robbins-Monro algorithm involves sampling a number of 
Monte Carlo paths much larger than that required by the 
LSIS approach. For instance, in the present example, the 
number of iterations quoted in ReL [10] is 400,000 while 
the LSIS results were obtained by sampling just ~ 400 
configurations. In addition, LSIS does not have any ex- 
ogenous parameter to be fine tuned in order to achieve an 
efficient optimization, thus making the approach easier to 
be automated. 



Finally, in order to compare with the adaptive Monte 
Carlo approach of Su and Fu [1, Q , we have slightly mod- 
ified the Asian option example and considered a payout 
depending on a partial average 

1 

X = y X, , (29) 

M - Mo ^ ^ ^ 

i=Mo 

where Mq is the time step index corresponding to the first 
observation. The results we have obtained in this case are 
reported in Table rvTTl and show that LSIS generally out- 
performs the approach of Ref. [1, Q . This is expected as 
LSIS involves a more general class of sampling distribu- 
tion (fT6| as the change of drift is not constrained to be 
uniform along the simulation path. 



C. Multi- modal Examples 

All the examples considered in the previous Sections 
were characterized by a single mode optimal sampling 
distribution In these cases, an Importance Sampling 
strategy based on the Gaussian trial distributions (fT6| or 
(P7)) provides good results. In this Section we will con- 
sider instead option pricing problems that are character- 
ized by multi- modal optimal sampling densities. These 
are very common in practice, and typically arise if the 
payout function G{Z) is not monotonic, or for claims 
written on multiple assets. We will show that in these 
cases a simple Importance Sampling strategy based on a 
pure change of drift of a Gaussian density proves ineffec- 
tive. In contrast, LSIS allows one to work with trial den- 
sities tailored to the problem at hand, including multi- 
modal distributions. As in Sect ion IV^ we begin by illus- 
trating the effectiveness of LSIS with a simple European 
style payout. 

1 . Straddle 

A simple example of a pricing problem characterized by 
a multi-modal sampling density is the European straddle: 

$(X) = iX - K)+ + {K - X)+ . (30) 

As illustrated in Fig. [31 the corresponding optimal sam- 
pling distribution, Popt{Z) cx G{Z) exp (-Z2/2), has two 
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well separated maxima. As a result, saddle point ap- 
proaches based on a Gaussian distribution, e.g., centered 
on the higher of the two modes, provide a poor approxi- 
mation of the optimal sampling density. Indeed, as shown 
in Tabic rvTm the saddle point method increases the vari- 
ance with respect to the crude Monte Carlo estimator. 
On the contrary, LSIS based on the simple trial density 
provides a small shift of the drift towards the higher 
mode of the optimal distribution [Fig. [3]. This can be 
easily understood. In fact, the best way to approximate 
a bi-modal distribution by means of a Gaussian density 
is to center the latter around the center of mass of the 
former, in this case jl ~ 0.24. Yet, since the two maxima 
of the optimal distribution are well separated, its overlap 
with the trial density is still rather poor. As a result, 
LSIS based on a trial density of the form (fTB)) produces 
only a small variance reduction. Better results can be ob- 
tained by adjusting the width of the trial density. Indeed, 
as also illustrated in Fig. [3l by increasing the standard 
deviation of the sampling density one achieves a better 
approximation of the optimal distribution. This leads to 
a sizable reduction of the variance of a factor of 3. 

An even better ansatz for the optimal density is clearly 
represented by a bi-modal trial density of the form 



P{Z) = (27r)-'^/2 w^e-^^-^- 



Wb e 



(31) 



where Wa + Wh — 1 (and d = 1 in this specific case) 
that can be optimized over /Xa, /Xf,, and Wa- The simu- 
lation of a density of this form is straightforward as it 
simply implies choosing on each Monte Carlo step one 
of the two Gaussian components in (|3ip . and sample a 
configuration according to it. This can be done by ex- 
tracting an auxiliary uniform random number ^ e [0, 1], 
and extracting Zi according to the first Gaussian com- 
ponent if ^ < Wa, and according to the second otherwise. 
The optimized distribution obtained using this bi-modal 
trial density is sketched in Fig. [31 and corresponds to 
a further sizable reduction of the variance (Table IVIIip . 
Remarkably, the same form of sampling density (|3ip can 
be used also for Asian style straddles (here d = AI , fia 
and Hb are M-dimensional vectors), and produces a very 
effective variance reduction, as illustrated in Table IIXI 



TABLE IX: Variance Reduction obtained with LSIS for a Eu- 
ropean (M = 1), and Asian style (M = 16 and 64) strad- 
dle (|3Up using the bi-modal sampling distribution (|31|l . The 
parameters used are r = 0.05, Xo — K = 50, a = 0.3, 
T — 1, and correspond for M = 1 to the probability dis- 
tributions sketched in Fig. [S] The number of simulated paths 
is 1,000,000. 





Crude MC 


LSIS 




M 


Price 


Price 


VR 


1 


11.803(10) 


11.8009(44) 


5.17(5) 


16 


7.0604(58) 


7.0559(26) 


4.98(5) 


64 


6.8200(55) 


6.8163(26) 


4.47(5) 



TABLE X: Importance Sampling results for the Asian style 
basket option (|32p obtained by means of LSIS, using the bi- 
modal sampling distribution ((STJ. The parameters used are 



r = 0.05, r = 1, X^^' = 100, X^-"' = 105, ai = 0.3, era = 0.3, 
and correspond (for K = 100 and M = 1) to the probabihty 
distributions sketched in Fig. 2] The number of simulated 
paths is 500,000. 



(2) 







Crude MC 


LSIS 




M 


K 


Price 


Price 


VR 


1 


90 


34.555(41) 


34.587(14) 


8.58(7) 




100 


26.621(39) 


26.589(15) 


6.76(3) 




110 


19.749(36) 


19.776(13) 


7.7(1) 


16 


90 


24.876(24) 


24.8700(87) 


7.6(1) 




100 


16.428(22) 


16.4081(81) 


7.4(1) 




110 


9.711(19) 


9.6918(68) 


7.8(1) 


32 


90 


24.537(23) 


24.5539(84) 


7.50(5) 




100 


16.082(22) 


16.0734(82) 


7.2(1) 




110 


9.381(18) 


9.3752(68) 


7.0(1) 



2. Basket Call Options 

Basket options are another very common class of con- 
tingent claims that can give rise to multi-modal optimal 
distributions. This is illustrated in Fig.[¥]for a very sim- 
ple European style Call option on the maximum of L = 2 
underlying assets following the process (fT4|) 



$(X) = (max(Xi,X2) -if)+ 



(32) 



Also in this case, LSIS based on a bi-modal trial density 
of the form (|3ip provides a significative variance reduc- 
tion that persists when introducing Asian features in the 
payout (see Table |X|. We obtained similar results for 
other multi assets options by generalizing the form of 
the trial density (|3ip. 



VI. CONCLUSIONS 

In this paper we have described a simple Importance 
Sampling technique based on a least squares minimiza- 
tion of straightforward implementation. The result- 
ing strategy, dubbed Least Square Importance Sampling 
(LSIS) , provides an effective variance reduction technique 
that lends itself to a variety of applications. 

We have presented several numerical examples in a 
diffusive setting, and we have shown that LSIS, when 
restricted to the optimization of the drift of a diffusion 
process, provides variance reductions similar to those ob- 
tained with existing approaches H, [l3|. However, 
LSIS is not limited to the determination of the optimal 
mean of a Gaussian sampling distribution. As a result, it 
outperforms other approaches when the ability to adjust 
the width of such distribution, or to sample non-Gaussian 
and multi-modal densities, is important to achieve vari- 
ance reductions. 

The LSIS strategy can be applied to any Monte Carlo 
setting provided that a reasonable ansatz for the opti- 
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mal sampling distribution is available. This makes LSIS 
a flexible Importance Sampling approach, that can be 
used across a variety of financial applications, ranging 
from Value at Risk (VaR) estimation, to portfolio credit 
risk management. This is currently the object of further 
investigations. 



discussions, and Paul Glasserman for an enlightening lec- 
ture that inspired this work. The opinion and views ex- 
pressed in this paper are uniquely those of the author, 
and do not necessarily represent those of Credit Suisse 
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